METHOD AND APPARATUS FOR PROVIDING A BIOINFORMATICS DATABASE

CROSS-REFERENCE TO RELATED APPLICATIONS 

The present application claims priority from U.S. Prov. App. No. 60/053,842
filed July 25, 1997, entitled COMPREHENSIVE BIO-INFORMATICS DATABASE, from
U.S. Prov. App. No. 60/069,198 filed on December 11, 1997, entitled COMPREHENSIVE
DATABASE FOR BIOINFORMATICS, and from U.S. Prov. App. No. 60/069,436, entitled
GENE EXPRESSION AND EVALUATION SYSTEM, filed on December 11, 1997. The
contents of all three provisional applications are herein incorporated by reference.

The subject matter of the present application is related to the subject matter of
the following three co-assigned applications filed on the same day as the present application:
GENE EXPRESSION AND EVALUATION SYSTEM (Attorney Docket No. 018547-035010),
METHOD AND SYSTEM FOR PROVIDING A POLYMORPHISM DATABASE (Attorney Docket
No. 018547-033820), and METHOD AND SYSTEM FOR PROVIDING A PROBE ARRAY CHIP
DESIGN DATABASE (Attorney Docket No. 018547-033830). The contents of these three
applications are herein incorporated by reference.

BACKGROUND OF THE INVENTION 

The present invention relates to the collection and storage of information
pertaining to processing of biological samples.

Devices and computer systems for forming and using arrays of materials on a
substrate are known. For example, PCT application WO 92/10588, incorporated herein by
reference for all purposes, describes techniques for sequencing or sequence checking nucleic
acids and other materials. Arrays for performing these operations may be formed
according to the methods of, for example, the pioneering techniques disclosed in U.S. Patent






No. 5,143,854 and U.S. Patent No. 5,571,639, both incorporated herein by reference for all
purposes. [...] as the monomer sequence of DNA or RNA. Such systems have been used to form, for
example, [...] gene (relevant to certain cancers), HIV, and other genetic characteristics.

[...] Publication No. WO 97/10365, the contents of which are herein incorporated by reference.

Many disease states are characterized by differences in the expression levels of various genes,
either through changes in the copy number of the genetic [...] through changes in levels [...]
through control of initiation, provision of RNA precursors, RNA [...]

[...] chip layout, mask design, probe synthesis, sample preparation, application of samples [...]
is performed. For many stages, there is also result information generated during [...]
convenient access and retrieval.

Many of the contemplated applications of probe array chips involve
performing all of the various stages on a very large scale. For example, consider surveying a [...]






WO 99/05591 PCT/US98/15469 


relevant to a particular form of cancer. Large numbers of samples must be collected and
processed. Information about the sample donors and sample preparation conditions should be
maintained to facilitate later analysis. The probe array chips will have associated layout
information. Each chip will be processed with samples and scanned individually. Each chip
will thus have its own scanning results. Finally, the scanning results will be interpreted and
analyzed for many subjects in an effort to identify the oncogenes and tumor suppressors.
The quantity of information to store and correlate is vast. Compounding the information
management problem, equipment and other laboratory resources may be shared with other
projects. A single laboratory may service many clients, each client in turn requesting
completion of multiple projects. What is needed is a system and method suitable for storing
and organizing large quantities of information used in conjunction with probe array chips.

SUMMARY OF THE INVENTION 

The present invention provides a system and method for organizing information
relating to polymer probe array chips, including oligonucleotide array chips. A database
model is provided which organizes information relating to sample preparation, chip layout,
application of samples to chips, scanning of chips, expression analysis of chip results, etc.
The model is readily translatable into database languages such as SQL. The database model
scales to permit mass processing of probe array chips.

According to a first aspect of the present invention, a computer-implemented
method for managing information relating to processing of polymer probe arrays includes a
step of creating an electronically-stored experiment table. The experiment table lists, for each
of a plurality of experiments, a first identifier identifying a target sample applied to a
polymer probe array chip in a particular experiment, and a second identifier identifying the
polymer probe array chip to which the target sample was applied in the particular
experiment. The method further includes a step of creating an electronically-stored chip
table. The chip table lists, for each of a plurality of polymer probe array chips: the second
identifier identifying a particular polymer probe array chip; and a third identifier specifying a
layout of polymer probes on the polymer probe array chip.

According to a second aspect of the present invention, a computer-implemented
method for managing information relating to processing of oligonucleotide
arrays includes a step of creating an electronically stored analysis table. The analysis table
BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 illustrates an overall system and process for forming and analyzing
arrays of biological materials such as DNA or RNA.

Fig. 2A illustrates a computer system suitable for use in conjunction with the
overall system of Fig. 1.

Fig. 2B illustrates a computer network suitable for use in conjunction with the
overall system of Fig. 1.

Fig. 3 illustrates a key for interpreting a database model.

Fig. 4 illustrates a [...] process of Fig. 1 according to one embodiment of the present invention.
DESCRIPTION OF SPECIFIC EMBODIMENTS
Biological Material Analysis System 

One embodiment of the present invention operates in the context of a system
for analyzing biological or other materials using arrays that themselves include probes that
may be made of biological materials such as RNA or DNA. The VLSIPS™ and GeneChip™
technologies provide methods of making and using very large arrays of polymers, such as
nucleic acids, on very small chips. See U.S. Patent No. 5,143,854 and PCT Patent
Publication Nos. WO 90/15070 and 92/10092, each of which is hereby incorporated by
reference for all purposes. Nucleic acid probes on the chip are used to detect complementary
nucleic acid sequences in a sample nucleic acid of interest (the "target" nucleic acid).

It should be understood that the probes need not be nucleic acid probes but
may also be other polymers such as peptides. Peptide probes may be used to detect the
concentration of peptides, polypeptides, or polymers in a sample. The probes must be
carefully selected to have bonding affinity to the compound whose concentration they are to
be used to measure.

Fig. 1 illustrates an overall system 100 for forming and analyzing arrays of
biological materials such as RNA or DNA. At the center of system 100 is a bioinformatics
database 102. Bioinformatics database 102 maintains information relevant to the various
stages of forming and processing the arrays as well as to interpreting and analyzing the
results. Bioinformatics database 102 facilitates large scale processing of arrays.

A chip design system 104 is used to design arrays of polymers, for example
biological polymers such as RNA or DNA. Chip design system 104 may be, for example, an
appropriately programmed Sun workstation or a personal computer or workstation, such as an
IBM PC equivalent, including appropriate memory and a CPU. Chip design system 104
obtains inputs from a user regarding chip design objectives, including characteristics of genes
of interest, and other inputs regarding the desired features of the array. Optionally, chip
design system 104 may obtain information regarding a specific genetic sequence of interest
from bioinformatics database 102 or from external databases such as GenBank. The output
of chip design system 104 is a set of chip design computer files in the form of, for example, a
switch matrix, as described in PCT application WO 92/10092, and other associated computer
files. The chip design computer files form a part of bioinformatics database 102. Systems
sample. The prepared samples will be placed in a scanning system 118. Scanning system
118 includes a detection device such as a confocal microscope or CCD (charge-coupled
device) that is used to detect the location where labeled receptors have bound to the
substrate. The output of scanning system 118 is one or more image files indicating, in the case of
fluorescein-labeled receptor, the fluorescence intensity (photon counts or other related
measurements, such as voltage) as a function of position on the substrate. These image files
also form a part of bioinformatics database 102. Since higher photon counts will be
observed where the labeled receptor has bound more strongly to the array of polymers, and
since the monomer sequence of the polymers on the substrate is known as a function of
position, it becomes possible to determine the sequence(s) of polymer(s) on the substrate that
are complementary to the receptor.

The image files and the design of the chips are input to an analysis system
120 that, e.g., calls base sequences, or determines expression levels of genes or expressed
sequence tags (ESTs). The expression level of a gene or EST is herein understood to be the
concentration within a sample of mRNA or protein that would result from the transcription
of the gene or EST. Such analysis techniques are disclosed in WO 97/10365 and U.S. App.
No. 08/531,137, the contents of which are herein incorporated by reference. Analysis results
are stored in bioinformatics database 102.

Chip design system 104, analysis system 120, and control portions of
exposure system 116, sample preparation system 114, and scanning system 118 may be
appropriately programmed computers such as a Sun workstation or IBM-compatible PC. An
independent computer for each system may perform the computer-implemented functions of
these systems, or one computer may combine the computerized functions of two or more
systems. One or more computers may maintain bioinformatics database 102 independent of
the computers operating the systems of Fig. 1, or database 102 may be fully or partially
maintained by these computers.

Fig. 2A depicts a block diagram of a host computer system 210 suitable for
implementing the present invention. Host computer system 210 includes a bus 212 which
interconnects major subsystems such as a central processor 214, a system memory 216
(typically RAM), an input/output (I/O) adapter 218, an external device such as a display
screen 224 via a display adapter 226, a keyboard 232 and a mouse 234 via I/O adapter
218, a SCSI host adapter 236, and a floppy disk drive 238 operative to receive a floppy disk
240. SCSI host adapter 236 may act as a storage interface to a fixed disk drive 242 or a
CD-ROM player 244 operative to receive a CD-ROM 246. Fixed disk 242 may be a part of host
computer system 210 or may be separate and accessed through other interface systems. A
network interface 248 may provide a direct connection to a remote server via a telephone
link or to the Internet. Network interface 248 may also connect to a local area network
(LAN) or other network interconnecting many computer systems. Many other devices or
subsystems (not shown) may be connected in a similar manner.

Also, it is not necessary for all of the devices shown in Fig. 2A to be present
to practice the present invention, as discussed below. The devices and subsystems may be
interconnected in different ways from that shown in Fig. 2A. The operation of a computer
system such as that shown in Fig. 2A is readily known in the art and is not discussed in detail
in this application. Code to implement the present invention may be operably disposed or
stored in computer-readable storage media such as system memory 216, fixed disk 242,
CD-ROM 246, or floppy disk 240.

Fig. 2B depicts a network 260 interconnecting multiple computer systems
210. Network 260 may be a local area network (LAN), wide area network (WAN), etc.
Bioinformatics database 102 and the computer-related operations of the other elements of
Fig. 2B may be divided amongst computer systems 210 in any way, with network 260 being
used to communicate information among the various computers. Portable storage media
such as floppy disks may be used to carry information between computers instead of
network 260.



Database General Model 

Bioinformatics database 102 is preferably a relational database with a
complex internal structure. The structure and contents of bioinformatics database 102 will
be described with reference to a logical model that describes the contents of tables of the
database as well as interrelationships among the tables. A visual depiction of this model will
be an Entity Relationship Diagram (ERD) which includes entities, relationships, and
attributes. A detailed discussion of ERDs is found in "ERwin version 3.0 Methods Guide"
available from Logic Works, Inc. of Princeton, NJ, the contents of which are herein
incorporated by reference. Those of skill in the art will appreciate that automated tools such
as Developer 2000 available from Oracle will convert the ERD from Fig. 4 directly into
executable code such as SQL code for creating and operating the database.
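
As a rough illustration of how such a model might translate into SQL, the following sketch builds a few of the entities discussed in this application (experiment, target, physical chip, chip design) in an in-memory SQLite database. The table names, column names, sample rows, and the use of Python's `sqlite3` module are all assumptions made for illustration; they are not taken from Fig. 4 or from the output of any particular conversion tool.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative assumptions only.
SCHEMA = """
CREATE TABLE chip_design (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE physical_chip (
    id        INTEGER PRIMARY KEY,
    design_id INTEGER NOT NULL REFERENCES chip_design(id)
);
CREATE TABLE target (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE experiment (
    id        INTEGER PRIMARY KEY,
    target_id INTEGER NOT NULL REFERENCES target(id),
    chip_id   INTEGER NOT NULL REFERENCES physical_chip(id)
);
"""


def build_demo_db():
    """Create the sketch schema and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO chip_design VALUES (1, 'design-A')")
    conn.executemany("INSERT INTO physical_chip VALUES (?, 1)", [(10,), (11,)])
    conn.execute("INSERT INTO target VALUES (100, 'target-1')")
    conn.executemany(
        "INSERT INTO experiment VALUES (?, 100, ?)", [(1000, 10), (1001, 11)]
    )
    conn.commit()
    return conn


def designs_for_target(conn, target_id):
    """Return (experiment id, chip id, design name) for one target."""
    return conn.execute(
        """SELECT e.id, c.id, d.name
           FROM experiment e
           JOIN physical_chip c ON c.id = e.chip_id
           JOIN chip_design d ON d.id = c.design_id
           WHERE e.target_id = ?
           ORDER BY e.id""",
        (target_id,),
    ).fetchall()
```

Following the identifiers from experiment to chip to chip design is what allows scan results for a target to be interpreted against the correct probe layout.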

Fig. 3 is a key to the ERD that will be used to describe the contents of
bioinformatics database 102. An aggregation (or "has a") relationship 302 signifies that one
entity has another entity. In the depicted example, a sequence set 304 has a sequence 306. A
one to many association (or "classification") relationship 308 signifies that one entity defines
an equivalence class of other entities. In the depicted example, a sample 310 defines an
equivalence class of targets 312. A MetaClass relationship 314 signifies that a collection of
one entity corresponds to another entity. In the depicted example, a collection of chips 316
corresponds to a chip design 318. A specialization (or "is a") relationship 320 indicates that
one entity is another entity. In the depicted example, a fragment 322 is a sequence 324.

An instantiation relationship 326 signifies that one entity is an instance of a
set of another entity. In the depicted example, K104-101 328 is an instance of the set of
subjects 330. If instantiation leads to a set rather than a unique element, the set being
instantiated is referred to as a metaclass. An associative object relationship 332 signifies that
a subset of the Cartesian product of a first set of entities and a second set of entities
corresponds to a third set of entities. In the depicted example, a subject 334 participates in
one or more subject groups 336 and each such subject participation 338 is an entity.
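
The associative object relationship can be pictured as a join table. The sketch below is a minimal illustration under stated assumptions: the table names, the group names, and the choice of SQLite through Python's `sqlite3` module are hypothetical. Only the subject identifier K104-101 and the subject, subject group, and subject participation entities come from the description of Fig. 3.

```python
import sqlite3


def build_subject_db():
    """Create subject, subject_group and the associative participation table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE subject (id TEXT PRIMARY KEY);
        CREATE TABLE subject_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Associative object: one row per (subject, group) pairing that occurs,
        -- i.e. a subset of the Cartesian product of the two sets.
        CREATE TABLE subject_participation (
            subject_id TEXT    NOT NULL REFERENCES subject(id),
            group_id   INTEGER NOT NULL REFERENCES subject_group(id),
            PRIMARY KEY (subject_id, group_id)
        );
        """
    )
    conn.execute("INSERT INTO subject VALUES ('K104-101')")
    # Group names are made up for the example.
    conn.executemany(
        "INSERT INTO subject_group VALUES (?, ?)",
        [(1, "group-1"), (2, "group-2"), (3, "group-3")],
    )
    conn.executemany(
        "INSERT INTO subject_participation VALUES ('K104-101', ?)", [(1,), (3,)]
    )
    conn.commit()
    return conn


def groups_of(conn, subject_id):
    """List the names of the groups in which a subject participates."""
    rows = conn.execute(
        """SELECT g.name
           FROM subject_participation p
           JOIN subject_group g ON g.id = p.group_id
           WHERE p.subject_id = ?
           ORDER BY g.id""",
        (subject_id,),
    )
    return [name for (name,) in rows]
```

Because each participation is a row of its own, attributes of the participation itself could be added as further columns of the join table.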

Fig. 4 is an entity relationship diagram (ERD) showing elements of 
bioinformatics database 102 according to one embodiment of the present invention. 

Each rectangle in the diagram corresponds to a table in database 102. For
each rectangle, the title of the table is listed above the rectangle. Within each rectangle,
columns of the table are listed. Above a horizontal line within each rectangle are listed key
columns, i.e., columns whose contents are used to identify individual records in the table. Below
this horizontal line are the names of non-key columns. The lines between the rectangles
identify the relationships between records of one table and records of another table. First,
the relationships among the various tables will be described. Then, the contents of each table
will be discussed in detail.

Certain details of bioinformatics database 102 pertain to expression analysis,
although other types of analysis such as base calling and the discovery of polymorphisms
may also be facilitated according to the present invention.

An experiment table 402 lists experiments, each performed on a target using a
particular physical chip according to a protocol. Targets are listed in a target
For each parameter listed in parameter template table 420, there is a unit of
measurement for that parameter. Thus, a parameter units table 430 has a one to many
relationship 432 to parameter template table 420.

For each target record in a target table 404 there is a target type record in a
target type table 434. The target type record identifies a type of target source, such as
blood, saliva, etc. There is a one to many association relationship 436 between target table
404 and target type table 434.

An analysis is carried out on an analysis data set collection according to a
protocol and according to an analysis scheme. Thus, there is an analysis table 438, an
analysis data set collection table 440, and an analysis scheme table 442. There is an aggregation
relationship 444 between protocol table 408 and analysis table 438, a one to many
association relationship 446 between analysis data set collection table 440 and analysis
table 438, and an aggregation relationship 448 between analysis scheme table 442 and
analysis table 438. A protocol for analysis is analogous to the protocols used for
experiments and target preparation.

An analysis scheme record gives the logical layout of a chip type. A logical
layout consists of a hierarchical assembly of units, blocks, atoms, and cells, each of which is
detailed in a separate table. There may be more than one logical layout for a particular
physical chip design because the same collection of probes of a single physical chip design
may be usable for disparate analysis objectives.

There is a chip design table 450 that has a one to many association
relationship 452 to physical chip table 412. The records of chip design table 450 identify a
physical chip layout. There is also a one to many association relationship 454 between chip
design table 450 and analysis scheme table 442 to represent the possibility of multiple logical
layouts for a particular physical layout.

A scheme unit table 456 lists records for units of the logical layout. A unit is
a collection of probes that interrogate one or more biological items such as genes. There is a
one to many relationship 458 between analysis scheme table 442 and scheme unit table 456.
Each unit has an associated unit type listed in a unit type table 460, with a one to many
association relationship 462 existing between unit type table 460 and scheme unit table 456.

A scheme block table 464 lists records for blocks of the logical layout. Although a 
one to many associative relationship 466 exists between scheme unit table 456 and scheme 
many association relationship 492.

Preferably, there are three result tables: an absolute gene expression result
table 494, a relative gene expression result table 496, and a measurement element table 498.
Each analysis may produce one or more absolute gene expression results, relative gene
expression results, or measurement element results. Thus, there are one to many association
relationships 500, 502, and 504 linking analysis table 438 to absolute gene expression result table
494, relative gene expression result table 496, and measurement element table 498 respectively.

A biological reference table 506 lists gene names. Each record in absolute
gene expression result table 494 and relative gene expression result table 496 corresponds to
a particular gene. Accordingly, there is a one to many associative relationship 508 between
biological reference table 506 and absolute gene expression result table 494 and another such
relationship 510 between biological reference table 506 and relative gene expression result
table 496. There is also a one to many associative relationship 512 between biological
reference table 506 and scheme block table 464 because each listed block corresponds to a
particular named gene.

An absolute gene expression result type table 514 lists the types of absolute
gene expression results including present, marginal, absent, and unknown. There is a one to
many relationship 516 between absolute gene expression result type table 514 and absolute
gene expression result table 494. A relative gene expression result type table 518 lists the types
of relative gene expression results including increased, no change, decreased, and unknown.
There is a one to many relationship 520 between relative gene expression result type table
518 and relative gene expression result table 496.
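
A result type table of this kind behaves as a lookup table referenced by each result record. The following sketch is illustrative only: the four type names come from the text above, while the table names, the composite key on the result table, the sample rows, and the use of SQLite through Python's `sqlite3` module are assumptions.

```python
import sqlite3

# Type names as listed in the text; everything else here is hypothetical.
ABSOLUTE_TYPES = ["present", "marginal", "absent", "unknown"]


def build_result_db():
    """Create a result type lookup table and a result table that refers to it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE abs_result_type (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE abs_result (
            analysis_id    INTEGER NOT NULL,
            item_id        INTEGER NOT NULL,
            result_type_id INTEGER NOT NULL REFERENCES abs_result_type(id),
            PRIMARY KEY (analysis_id, item_id)  -- assumed key
        );
        """
    )
    conn.executemany(
        "INSERT INTO abs_result_type (id, name) VALUES (?, ?)",
        list(enumerate(ABSOLUTE_TYPES, start=1)),
    )
    # Made-up results: two genes called present, one called absent.
    conn.executemany(
        "INSERT INTO abs_result VALUES (?, ?, ?)",
        [(1, 501, 1), (1, 502, 3), (1, 503, 1)],
    )
    conn.commit()
    return conn


def calls_by_type(conn, analysis_id):
    """Count results of each type for one analysis, including empty types."""
    rows = conn.execute(
        """SELECT t.name, COUNT(r.item_id)
           FROM abs_result_type t
           LEFT JOIN abs_result r
                  ON r.result_type_id = t.id AND r.analysis_id = ?
           GROUP BY t.id
           ORDER BY t.id""",
        (analysis_id,),
    ).fetchall()
    return dict(rows)
```

Keeping the type names in their own table means a result record stores only a small identifier, and the set of permitted types can be read from one place.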

Database Contents 

The contents of the tables introduced above will now be presented in greater
detail. It is to be understood that each table includes multiple records, with each record
having multiple fields corresponding to columns of the table. Experiment table 402 includes
one record for each experiment run. An ID column is the primary key for experiment table
402, holding a unique identifier for each experiment. In describing the other tables, it will be
understood that the "primary key" always serves this purpose. A protocol ID column
identifies the protocol used for the experiment as listed in protocol table 408. A target ID
column identifies the target sample used in the experiment as listed in target table 404. A
column. There is a collection ID column which identifies which data set collection each data
set belongs to as listed in analysis data set collection table 440. An analysis ID column
identifies the analysis used to produce the data set, if the data set is in fact the product of an
analysis. An experiment ID column identifies the experiment used to produce the data set, if
the data set is instead the product of an experiment. A type ID column indicates whether the
data set is the product of an experiment or an analysis.
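
One way to picture the type ID column is as a discriminator that says which of the two source columns is populated. The sketch below is a hedged illustration: the table and column names are hypothetical, and the CHECK constraint that enforces agreement between the type and the populated column is an added assumption, not something stated in this description.

```python
import sqlite3


def build_data_set_db():
    """Create a data set table whose type_id selects the populated source column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE data_set_type (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        INSERT INTO data_set_type VALUES (1, 'experiment'), (2, 'analysis');
        CREATE TABLE data_set (
            id            INTEGER PRIMARY KEY,
            collection_id INTEGER NOT NULL,
            type_id       INTEGER NOT NULL REFERENCES data_set_type(id),
            experiment_id INTEGER,
            analysis_id   INTEGER,
            -- Assumed rule: exactly the column matching the type is filled in.
            CHECK (
                (type_id = 1 AND experiment_id IS NOT NULL AND analysis_id IS NULL)
             OR (type_id = 2 AND analysis_id IS NOT NULL AND experiment_id IS NULL)
            )
        );
        """
    )
    return conn


def add_data_set(conn, collection_id, type_name, source_id):
    """Insert a data set produced by an experiment or an analysis; return its id."""
    (type_id,) = conn.execute(
        "SELECT id FROM data_set_type WHERE name = ?", (type_name,)
    ).fetchone()
    column = "experiment_id" if type_name == "experiment" else "analysis_id"
    cur = conn.execute(
        f"INSERT INTO data_set (collection_id, type_id, {column}) VALUES (?, ?, ?)",
        (collection_id, type_id, source_id),
    )
    return cur.lastrowid
```

With such a constraint, a record that claims to come from an experiment but carries an analysis identifier would be rejected at insert time.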

Analysis data set type table 482 lists the types of analysis data sets, preferably 
"experiment" and "analysis" to indicate the data source. There is a primary key column and 
a name column giving the type name. 

Analysis algorithm table 486 lists algorithms used for analysis. There is a
primary key column and a name column giving an algorithm name. A type column indicates
whether the algorithm produces absolute gene expression results, relative gene expression
results, or results for a particular cell on the chip.

Algorithm type table 490 lists the types of algorithm results. There is a
primary key column and a type column listing the different result types used in the type
column of analysis algorithm table 486.

Measurement element table 498 lists analysis results for individual cells or
probes. There is an analysis ID column identifying the analysis listed in analysis table 438
that produces the results listed in measurement element table 498. There are location X and
location Y columns giving the probe coordinates on the chip. The analysis ID, location X,
and location Y columns are together a key for measurement element table 498. There is an
intensity column which holds a calculated average fluorescent intensity for each cell or
probe. A statistic column gives the standard deviation of
intensity measured over the probes. A pixels column lists the number of pixels used to
compute the average intensities in the intensity column. A flag column stores a three bit flag
for each individual cell analysis result. The first bit is set if the cell has been masked out of
the analysis indicated in the analysis ID column, in which case the intensity and statistic columns
hold inapplicable data. A second bit indicates whether the analysis has determined
the cell to be an outlier with results inconsistent with other cells. A third bit indicates if the
cell intensity has been modified compared to the value based on experimental measurements. An
original intensity column lists the cell intensity if it has been modified; otherwise the entry in
this column is set to "1."
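
The three bit flag can be decoded with simple bit masks. The sketch below assumes that the "first" bit is the least significant bit; the text does not state the bit order, so this ordering, together with all names used here, is an assumption made for illustration.

```python
from dataclasses import dataclass

# Assumed bit positions: first bit = least significant bit.
MASKED = 0b001    # cell was masked out of the analysis
OUTLIER = 0b010   # cell was judged inconsistent with other cells
MODIFIED = 0b100  # stored intensity differs from the measured value


@dataclass(frozen=True)
class CellFlags:
    masked: bool
    outlier: bool
    modified: bool


def decode_flags(flag: int) -> CellFlags:
    """Split a three bit flag value into its three booleans."""
    if not 0 <= flag <= 0b111:
        raise ValueError(f"flag must fit in three bits, got {flag}")
    return CellFlags(
        masked=bool(flag & MASKED),
        outlier=bool(flag & OUTLIER),
        modified=bool(flag & MODIFIED),
    )


def encode_flags(masked: bool = False, outlier: bool = False,
                 modified: bool = False) -> int:
    """Pack three booleans back into a single three bit integer."""
    return (
        (MASKED if masked else 0)
        | (OUTLIER if outlier else 0)
        | (MODIFIED if modified else 0)
    )
```

A consumer of the table would typically check the masked bit before using the intensity and statistic values, since those columns hold inapplicable data for masked cells.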
expression analyses. A comparative analysis is based on experiment results obtained from
experiments on two targets: a baseline target and an experimental target. For example, the
baseline target may be made from normal tissue while the experimental target may be made
from cancerous tissue. Other tissue types used as targets may correspond to different stages
of treatment or disease progression, different species, or different organs.

An analysis ID column identifies the analysis as listed in analysis table 438
that produced the relative gene expression results. An item ID column identifies the gene as
listed in biological reference table 506 for which results are stored. The analysis ID and item
ID together constitute a primary key for relative gene expression result table 496. A result
type ID column indicates whether the listed relative expression results indicate increased
expression, no change in expression, decreased expression, or an unknown change in
expression by referring to entries in relative gene expression result type table 518. A
positive pairs ratio column lists the ratio of the numbers of positive probe pairs between the
two targets. A positive increase column indicates the number of probe pairs for which the
difference between perfect match and mismatch hybridization intensities is significantly
greater for the experimental target. A positive delta column indicates the difference in
the number of positive probe pairs between the two targets. A negative pairs ratio column
lists the ratio of the numbers of negative probe pairs for the two targets. A negative increase
column indicates the number of probe pairs for which the difference between perfect match
and mismatch hybridization intensities is significantly greater for the baseline target. A
negative delta column indicates the difference in the number of negative probe pairs
between the two targets. An average ratio delta column indicates the difference between
average log ratios for the experimental and baseline targets. An average intensity difference
delta column indicates the difference between the average intensity differences for the
experimental and baseline targets. An average difference ratio column indicates the
magnitude of the ratio of the average differences for the experimental and baseline targets.
A log average ratio delta column indicates the difference between the log average ratios of
the experimental and baseline targets. A significance column provides an indication of the
differences in expression between the experimental and baseline targets. This significance
column is based on both the average difference ratio and the average intensity difference
delta. A base absent column indicates whether the gene in question is seemingly not
expressed in the baseline target. A difference call column (not shown) indicates whether the
Analysis scheme table 442 lists logical layouts for chip types. A logical
layout consists of a hierarchical assembly of units, blocks, atoms, and cells. There is a
primary key column. A chip design ID column identifies the chip type for each logical
layout. The same chip type may have more than one logical layout.

Unit type table 460 lists various types of units that make up a logical layout.
There is a primary key column and a name column listing unique names for each unit type.

Scheme unit table 456 stores a record for each unit in the logical layout.
There is a scheme ID column identifying the logical layout with which the unit is associated.
There is a unit index column giving an index number for the unit ranging from 1 to the total
number of units on the chip. The scheme ID column and unit index column together operate
as a key to scheme unit table 456. There is a type ID column giving the unit type for each
unit. A name column gives a name for each unit. A direction column indicates whether the
unit interrogates in a coding or non-coding direction, i.e., whether the sample contains
sequence from the sense DNA strand or the anti-sense DNA strand.

Scheme block table 464 stores a record for each block. Each block of the
logical layout interrogates the activity of a single gene. There is a scheme ID column
indicating the logical layout to which the block belongs. A unit index column indicates the
unit to which the block belongs. A block index column gives an index number for the block,
ranging from 1 to the number of blocks in the unit. The scheme ID, unit index, and block
index together constitute a primary key for scheme block table 464. An item ID column
identifies the interrogated gene by reference to biological reference table 506.

Scheme atom table 468 lists records for every atom of the logical layout.
Atoms correspond to pairs of perfect match and mismatch probes. A scheme ID column
identifies the logical layout to which the atom belongs. A unit index column indicates the
unit to which the atom belongs. A block index column indicates the block to which the atom
belongs. An atom index column gives an index number for the atom ranging from 1 to the
number of atoms in the block. Together, the scheme ID, unit index, block index, and atom
index constitute a key to scheme atom table 468. A position column indicates the sequence
position in which the perfect match and mismatch probe differ. A T-base column indicates
the base in the mismatch probe at the substitution position. An atom number column gives
position information for the probe pair within its unit.
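
The unit, block, and atom tables form a hierarchy in which each level's key extends the key of the level above. The following sketch shows one way such composite keys might be declared. The composite primary keys follow the text; the table names, the foreign keys between levels, the sample rows, and the use of SQLite through Python's `sqlite3` module are assumptions added for illustration, and the non-key columns are omitted.

```python
import sqlite3


def build_scheme_db():
    """Create unit/block/atom tables whose keys nest, plus a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE scheme_unit (
            scheme_id  INTEGER NOT NULL,
            unit_index INTEGER NOT NULL,
            PRIMARY KEY (scheme_id, unit_index)
        );
        CREATE TABLE scheme_block (
            scheme_id   INTEGER NOT NULL,
            unit_index  INTEGER NOT NULL,
            block_index INTEGER NOT NULL,
            PRIMARY KEY (scheme_id, unit_index, block_index),
            FOREIGN KEY (scheme_id, unit_index)
                REFERENCES scheme_unit (scheme_id, unit_index)
        );
        CREATE TABLE scheme_atom (
            scheme_id   INTEGER NOT NULL,
            unit_index  INTEGER NOT NULL,
            block_index INTEGER NOT NULL,
            atom_index  INTEGER NOT NULL,
            PRIMARY KEY (scheme_id, unit_index, block_index, atom_index),
            FOREIGN KEY (scheme_id, unit_index, block_index)
                REFERENCES scheme_block (scheme_id, unit_index, block_index)
        );
        """
    )
    # One unit with two blocks holding 2 and 3 atoms; indices start at 1.
    conn.execute("INSERT INTO scheme_unit VALUES (1, 1)")
    for block_index, atom_count in [(1, 2), (2, 3)]:
        conn.execute("INSERT INTO scheme_block VALUES (1, 1, ?)", (block_index,))
        conn.executemany(
            "INSERT INTO scheme_atom VALUES (1, 1, ?, ?)",
            [(block_index, atom) for atom in range(1, atom_count + 1)],
        )
    conn.commit()
    return conn


def atoms_per_block(conn, scheme_id, unit_index):
    """Return (block index, number of atoms) for every block of one unit."""
    return conn.execute(
        """SELECT block_index, COUNT(*)
           FROM scheme_atom
           WHERE scheme_id = ? AND unit_index = ?
           GROUP BY block_index
           ORDER BY block_index""",
        (scheme_id, unit_index),
    ).fetchall()
```

Because every child key contains its parent's key, all atoms of a block or unit can be selected by a prefix of the key without joining through the intermediate tables.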

Scheme cell table 472 lists records for every cell of the logical layout. Cells 
WHAT IS CLAIMED IS:

1 1 . A computer-implemented method for managing information 

2 relating to processing of polymer probe arrays, said method comprising the steps 

3 of: 

4 creating an electronically-stored experiment table, said experiment 

5 table storing a record for an experiment, said experiment record comprising: 

6 a first identifier identifying a target sample applied to a 

7 polymer probe array chip in said experiment; 

8 a second identifier identifying said polymer probe array chip 

9 to which said target sample was applied in said experiment; and 

10 creating an electronically-stored chip table, said chip table storing a 

1 1 record for said polymer probe array chip, said chip record comprising: 

12 said second identifier identifying said polymer probe array 

13 chip; and 

14 a third identifier specifying a layout of polymer probes on 

1 5 said polymer probe array chip. 

1 2. The method of claim Al further comprising the step of: 

2 performing an experiment wherein said target sample is applied to 

3 said polymer probe array chip. 

3. The method of claim 1 further comprising the steps of: 
    creating an electronically-stored target table, said target table storing a record for 
    said target sample, said target sample record comprising: 
        said first identifier identifying said target sample; and 
        a fourth identifier specifying parameters of preparation of said target sample. 

4. The method of claim 1 wherein said polymer probe array chip comprises an 
oligonucleotide array chip. 




5. A computer-implemented method for managing information relating to processing of 
oligonucleotide probe arrays, said method comprising the steps of: 
    creating an electronically stored analysis table, said analysis table listing for each 
    of a plurality of expression analysis operations: 
        a first identifier specifying a particular analysis operation 
        a second identifier specifying oligonucleotide array processing result information 
        on which said particular expression analysis operation has been performed; and 
    creating an electronically stored gene expression result table, said gene expression 
    result table listing for each of selected ones of said plurality of analysis 
    operations, a list of genes or expressed sequence tags and results of said particular 
    expression analysis operation as applied to each of said genes or expressed sequence 
    tags. 

6. A computer-implemented method for managing information relating to processing of 
polymer probe arrays, said method comprising the steps of: 
    storing in an electronically-stored experiment table for each of a plurality of 
    experiments, a first identifier identifying a target sample applied to a polymer 
    probe array chip in a particular experiment; 
    storing in said electronically-stored experiment table for each of said plurality of 
    experiments a second identifier identifying said polymer probe array chip to which 
    said target sample was applied in said particular experiment; 
    storing in an electronically-stored chip table for each of a plurality of polymer 
    probe array chips, said second identifier identifying a particular polymer probe 
    array chip; and 
    storing in said electronically-stored chip table for each of said plurality of polymer 
    probe array chips a third identifier specifying a layout of polymer probes on said 
    polymer probe array chip. 

7. The method of claim 6 further comprising the steps of: 
    storing in an electronically-stored target table, for each of a plurality of target 
    samples, said first identifier identifying a particular target sample; and 
    storing in said electronically-stored target table, for each of said plurality of 
    target samples, a fourth identifier specifying parameters of preparation of said 
    particular target sample. 

8. The method of claim 6 wherein said polymer probe array chip comprises an 
oligonucleotide array chip. 

9. A computer-readable storage medium having stored thereon: 
    code for creating an electronically-stored experiment table, said experiment table 
    listing for each of a plurality of experiments: 
        a first identifier identifying a target sample applied to an oligonucleotide 
        array chip in a particular experiment; 
        a second identifier identifying said oligonucleotide array chip to which said 
        target sample was applied in said particular experiment; and 
    code for creating an electronically-stored chip table, said chip table listing for 
    each of a plurality of oligonucleotide array chips: 
        said second identifier identifying said particular oligonucleotide array chip; and 
        a third identifier specifying a layout of oligonucleotide probes on said 
        particular oligonucleotide array chip. 

10. The computer-readable storage medium of claim 9 having further stored thereon: 
    code for creating an electronically-stored target table, said target table listing 
    records comprising: 
        said first identifier identifying said target sample for one or more of said 
        plurality of experiments; and 
        a fourth identifier specifying parameters of preparation of said target sample 
        for one or more of said plurality of experiments. 

11. A computer-readable storage medium having stored thereon: 
    an electronically-stored experiment table, said experiment table listing for each of 
    a plurality of experiments: 
        a first identifier identifying a target sample applied to an oligonucleotide 
        array chip in a particular experiment; 
        a second identifier identifying said oligonucleotide array chip to which said 
        target sample was applied in said particular experiment; and 
    an electronically-stored chip table, said chip table listing for each of a plurality 
    of oligonucleotide array chips: 
        said second identifier identifying a particular oligonucleotide array chip; and 
        a third identifier specifying a layout of oligonucleotide probes on said 
        particular oligonucleotide array chip. 

12. A computer-readable storage medium for managing information relating to processing 
of oligonucleotide arrays, said storage medium having stored thereon: 
    code for creating an electronically stored analysis table, said analysis table listing 
    for each of a plurality of expression analysis operations: 
        a first identifier specifying a particular analysis operation 
        a second identifier specifying oligonucleotide array processing result information 
        on which said particular expression analysis operation has been performed; and 
    code for creating an electronically stored gene expression result table, said gene 
    expression result table listing for each of selected ones of said plurality of 
    analysis operations, a list of genes and results of said particular expression 
    analysis operation as applied to each of said genes. 

13. A computer readable storage medium having stored thereon: 
    an analysis table, said analysis table listing for each of a plurality of expression 
    analysis operations: 
        a first identifier specifying a particular analysis operation 
        a second identifier specifying oligonucleotide array processing result information 
        on which said particular expression analysis operation has been performed; and 
    a gene expression result table, said gene expression result table listing for each of 
    selected ones of said plurality of analysis operations, a list of genes and results 
    of said particular expression analysis operation as applied to each of said genes. 
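
For illustration only, the relationship among the experiment, chip, and target tables recited in the claims can be sketched as follows, assuming a relational store such as SQLite. All table names, column names, and the surrogate `experiment_id` key are hypothetical; the claims recite only the "first" through "fourth" identifiers and do not prescribe a schema or a database engine.

```python
import sqlite3

# Hypothetical schema: the comments map each column to the identifier
# recited in the claims. Names and types are assumptions for this sketch.
SCHEMA = """
CREATE TABLE target (
    target_id      INTEGER PRIMARY KEY,  -- "first identifier" (target sample)
    preparation_id INTEGER               -- "fourth identifier" (preparation parameters)
);
CREATE TABLE chip (
    chip_id   INTEGER PRIMARY KEY,       -- "second identifier" (probe array chip)
    layout_id INTEGER                    -- "third identifier" (probe layout)
);
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,   -- assumed surrogate key, not recited
    target_id     INTEGER NOT NULL REFERENCES target(target_id),
    chip_id       INTEGER NOT NULL REFERENCES chip(chip_id)
);
"""


def build_database():
    """Create an in-memory database holding the three sketch tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def layout_for_experiment(conn, experiment_id):
    """Follow experiment -> chip to find the layout used in an experiment."""
    row = conn.execute(
        "SELECT chip.layout_id FROM experiment "
        "JOIN chip ON chip.chip_id = experiment.chip_id "
        "WHERE experiment.experiment_id = ?",
        (experiment_id,),
    ).fetchone()
    return None if row is None else row[0]


conn = build_database()
conn.execute("INSERT INTO target VALUES (7, 70)")
conn.execute("INSERT INTO chip VALUES (3, 30)")
conn.execute("INSERT INTO experiment VALUES (1, 7, 3)")
```

Because the experiment record carries only the chip identifier, the probe layout is stored once per chip and is recovered by joining on that identifier instead of being repeated in every experiment record.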